03:00
This slide deck was built in Quarto!
So far, you’ve:
::: notes
One way you can work with the data you’ve groomed and selected from BigQuery is by exporting it for use elsewhere. Depending on the size of the data and how often it changes, this can be a great use case, and very convenient. Let’s go over that briefly.

Now, you might think that the Export button you see while looking at a table would be a great way to get data out of GCP, but that button is actually for exporting to other Google products, like Looker Studio or Google Cloud Storage. If that’s what you want, fantastic, but let’s assume you want to take a .csv and just run with that data somewhere else. What should you do?

To export a .csv, you need to have query results, and you can save those in various formats. So we have a couple of options.
:::
Since you’ve saved the query you used to create this table, you can re-run that query.
Or, since you’ve created a table that contains all the data you want, you can run a “give me everything” query:

```sql
SELECT * FROM [table_name]
```
Run a query – either your saved query or a “SELECT * FROM…” on your saved table.
Then click the Save Results button and save the .csv to your computer.
03:00
Quick reminder: R is a language that can be run in many settings:
RStudio is a fully featured IDE that runs on Linux, Windows, and macOS. It’s much more heavyweight than just the language.
Many of us love RStudio. You can spin up an RStudio server in GCP… but you don’t have to, and it might mean spending money when you don’t need to.
You can use your normal RStudio (on your computer or in Posit.cloud) to work with BigQuery.
Please open whatever RStudio you typically use. https://posit.cloud is a good option!
When you get there, open a new Quarto or R Markdown document and run install.packages('bigrquery') and also install tidyverse if you haven’t already.
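That setup step might look like the following chunk (a one-time install; skip any package you already have):

```r
# One-time setup: install the BigQuery client and the tidyverse
install.packages("bigrquery")
install.packages("tidyverse")
```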
05:00
There are several ways to connect to BigQuery (see, e.g. https://solutions.posit.co/connections/db/databases/big-query/) but the way I’m going to show you uses a library called bigrquery.
Please install bigrquery as well as tidyverse, if you don’t already have that installed.
Then you’re going to start a new Quarto or R Markdown document and add a code chunk that loads these two libraries:
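The chunk just loads the two packages you installed a moment ago:

```r
library(bigrquery)
library(tidyverse)
```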
Now you need to authenticate so that RStudio can connect to BigQuery. Next, run:

```r
bq_auth()
```
Use the same Google Identity you’re using for GCP. If it suggests installing httpuv, do it, as otherwise authorization might fail.
You might or might not see this screen: the authentication code might get passed back automatically, with a message that says
Authentication complete. Please close this page and return to R.
::: notes
OK, so now you’ve seen what this looks like. I’m going to do a screen share, but I’m going to stop sharing before it shows my authentication code, so you’ll just have to imagine those last steps.
:::
The pattern is:
1. Create `project_id` and `my_sql_query` objects
2. `results <- bq_project_query(project_id, my_sql_query)`
3. `df <- bq_table_download(results)`

Visit https://github.com/pm0kjp/rmedicine_2024_bigquery/blob/main/bq_demo.qmd and get the code, but we’ll also do this together bit by bit.
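Put together, the pattern looks like this. The project ID, dataset, and table names below are placeholders; substitute your own, which you can get from your GCP project dashboard:

```r
library(bigrquery)

# Placeholder: your project ID, from the GCP project dashboard
project_id <- "my-gcp-project"

# Placeholder: any legal BigQuery SQL against a table you can read
my_sql_query <- "SELECT * FROM `my-gcp-project.my_dataset.my_table`"

# Run the query in BigQuery, then download the results as a data frame
results <- bq_project_query(project_id, my_sql_query)
df <- bq_table_download(results)
```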
::: notes
Okay, so let me walk you through what this will be like. I’m going to make a new code chunk, and create two new objects, one of which will hold my project id, which I can get from my GCP project dashboard, and one of which will hold my SQL query. Let’s do that together.
Then, once I have my query and my project ID set, I’m ready to use bigrquery commands to run that query, and then to retrieve the data. Let’s add that code to a new code chunk, and run it.
And now, in RStudio, I have this data in my environment. Let’s take a look at that data by using View.
:::
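That last inspection step, assuming the downloaded data frame is named `df` as in the pattern:

```r
# Open the data frame in RStudio's spreadsheet-style viewer
View(df)
```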
# Vertex AI Workbench
::: notes
In the burger menu, select Vertex AI, then Workbench. Enable the API.
:::
# Create a Workbench Instance
Important to shut down: <https://cloud.google.com/vertex-ai/docs/workbench/instances/shut-down>
::: notes
You'll need to create a Workbench instance. This is going to use compute resources that cost money, so it's important to remember to shut down the instance when you're not using it: <https://cloud.google.com/vertex-ai/docs/workbench/instances/shut-down>. Default settings include a 3-hour idle timeout, which is useful!
:::
# Setting up R kernel {.smaller}
In a new Terminal in your Jupyter instance, enter the following:
```
conda create -n r
conda activate r
conda install -c r r-essentials
conda install -c r r-tidyverse
conda install -c r r-bigrquery
```
Answer "y" when prompted.
Once all this is installed, issue this command:

```
conda activate r
```
File > New > Notebook
Select “R” as the kernel
Joy Payton, Children’s Hospital of Philadelphia